Open Source
The Surge Capacity Assessment Tool (SCAT) is used to
assess a laboratory’s readiness for surge capacity in an emergency
situation. the SCAT consists of an Excel document in which the
laboratory data is recorded. In case multiple Excel files need to be
analysed and compared, it can be a challenging task to extract all the
data from the list of SCAT Excel files.
This document describes how multiple Excel files can be the imported at
once and subsequently compared and analysed. The code is written in R
and some basic knowledge of the language is required to apply this code.
The code is presented in chunks, which can be read by clicking the
code buttons on the right. As an example, three SCAT
Excel files can be used as dummies. Do not forget to remove
these dummy Excel-files from the import folder before you
extract your own Excel-files.
The code for extracting the information from the Excel files modifies
the structure of the tables. It does not changes any recorded results.
Each SCAT question is seen as a table-variable, with one row for each
laboratory (or Excel-file). The SCAT questions would form very long
variables and are therefore truncated, i.e. “How many additional
manhours would be available to work on the post-analytical phase under
emergency operations?” will be truncated to variable name
man_hrs_an_add. The table below provides an overview of the
modified variables for the SCAT-data collection.
| Variable | Description | Data type | Range | Unit |
|---|---|---|---|---|
| file_name | name of file | Character | ||
| date | Date in the SCAT form | Double | ||
| user_guidelines | unknown | NA | ||
| equipment_1 | Most important equipment, first | Character | ||
| equipment_2 | Most important equipment, second | Character | ||
| equipment_3 | Most important equipment, third | Character | ||
| test_max_day | Maximum nr of mol. tests per day | Double | Numeric | |
| operating_hrs | Lab operating hrs per day, normal | Double | (1-24) | hours |
| work_hrs | Number of hrs per day that staff works | Double | (1-24) | hours |
| man_hrs_pre | Number of hrs per day pre-analytic phase | Double | (1-24) | hours |
| man_hrs_an | Number of hrs per day analytic phase | Number | (1-24) | hours |
| man_hrs_post | Number of hrs per day post-analytic phase | Number | (1-24) | hours |
| equipment_[1/2/3]_normal | Equipment 1/2/3 running hrs per day, normal operations | Double | (1-24) | hours |
| occupation_[1/2/3]_normal | Equipment 1/2/3 occupation per day, normal operations | Double | Percentage | |
| operating_hrs_max | Lab operating hrs per day, during emergency | Double | (1-24) | hours |
| work_hrs_max | Number of hrs per day that staff works, during emergency | Double | (1-24) | hours |
| man_hrs_pre_add | Number of hrs per day pre-analytic phase, emergency | Double | (1-24) | hours |
| man_hrs_an_add | Number of hrs per day analytic phase, emergency | Double | (1-24) | hours |
| man_hrs_post_add | Number of hrs per day post-analytic phase, emergency | Double | (1-24) | hours |
| equipment_[1/2/3]_emergency | Additional equipment 1, emergency | Double | ||
| occupation_[1/2/3]_emergency | Maximum occupation equipment 1, emergency | Double | Percentage | |
| max_nr_tests | Maximum number of mol. tests, emergency | Double | ||
| limiting_factor | Factor limiting 24 hr surge capacity | Character | ||
Several basic R-libraries are used and loaded.
rm(list = ls()) ## remove all data from global environment.
## Set working directory
#setwd("C:/Users/hovetr/Documents/SCAT/SCAT_R") Not needed with Markdown, but perhaps when using R-Project)
## Load libraries
library(DT)
library(htmlwidgets)
library(tidyverse)
library(janitor)
library(kableExtra)
library(plotly)This code contains a few functions that come of use. The function
round_df enables to round-off all ‘numbers’ within a
data-frame to a given number,
i.e. round_df(penguins, 2).
The function create_dt is based on the
DT library and creates an interactive table of a
data-frame.
Important Note: applying this function for a Markdown
HTML document, will incorporate all data from the set within the HTML
file.
#### rounding numbers
round_df <- function(df, digits) {
nums <- vapply(df, is.numeric, FUN.VALUE = logical(1))
df[,nums] <- round(df[,nums], digits = digits)
(df)
}
#### Create Data table. -----
create_dt <- function(x){
DT::datatable(x, class = 'cell-border stripe', filter = 'top',
extensions = 'Buttons',
options = list(dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
}From the the folder import, all available Excel
files with he extension .xlsm are imported into the
data-frame scat_df. The data is retrieved only from the tab
24h Surge Calculator. The data from the different Excel
file are merged together in one data frame. The first column with the
name “Value” contains the name of the Excel file.
scat_df <- list.files(path = "input", # Change the name of the path corresponding to the dedicated folder name
full.names = TRUE,
recursive = TRUE,
pattern = "*.xlsm") %>% # Extension name is .xlsm (not .xlsx)
tbl_df() %>%
mutate('24h Surge Calculator' = map(value, readxl::excel_sheets)) %>%
unnest('24h Surge Calculator') %>%
rename(sheet = '24h Surge Calculator') %>% # Column names starting with a number are always causing trouble; rename it to 'sheet'
filter(sheet == "24h Surge Calculator") %>%
mutate(myFiles = purrr::map2(value, sheet, function(x,y) {
readxl::read_excel(x, sheet = paste(y))})) %>%
unnest(myFiles) The initial data-frame is a bit scrambled. This code chunk is cleaning up the data-frame and makes it more readable.
scat_df <- scat_df %>%
select(-...3) # remove column containing only NA
#The date is imported as a number with class character
scat_df$...5 <- as.numeric(scat_df$...5) # column is changed to class numeric
scat_df$...5 <- excel_numeric_to_date(scat_df$...5) #using function from janiter, excel number is changed to data
#Change column names: makes is easier to work
scat_df <- scat_df %>%
rename(Question = `CoE Project 81 - 24h Surge Capacity Calculator`) %>%
rename(Answer = `...2`) %>%
rename(var4 = `...4`) %>%
rename(var5 = `...5`)
# remove rows if columns 3-6 all contain NA
scat_df <- scat_df[rowSums(is.na(scat_df[,3:6]))!=4,]
# 1st part: move 'date: from column var4 to column 'Question'
# 2nd part: change date to character, then copy (if it starts with 4 digits) to column 'Question'
move_date <- function(x) {
x <- mutate(x, Question =
ifelse(grepl('Date:', x$var4), "Date:", x$Question)
)
x$var5 <- as.character(x$var5)
x <- mutate(x, Answer =
ifelse(grepl('[[:digit:]]{4}', x$var5), x$var5, x$Answer)
)
}
scat_df <- move_date(scat_df)
# Working with NA in ifelse, is a little bit different.
# Move date from column var to column Answer, if column Answer contains NA
scat_df$Answer <- ifelse(is.na (scat_df$Answer),
scat_df$var4,
scat_df$Answer)
# Drop last two columns that aren't used anymore
scat_df <- scat_df %>%
select(-var4, -var5)
# Delete rows with long text-strings
scat_df <- scat_df %>%
filter(!grepl('^This part of the tool', Question))
scat_df <- scat_df %>%
filter(!grepl('^Before answering the', Question)) This code-chunk extracts the City-name (or laboratory name) from the name of the file and adds it to the data-frame as a separate value. It is therefore important that the names of the Excel files in the folder are kept consistently.
scat_df$city <- sub(".*input/P81 BIOSEC_SCAT_*(.*?) *.xlsm*", "\\1", scat_df$value)
tbl_city <- unique(scat_df$city)
print(tbl_city)## [1] "Bangkok" "Chumpon" "Udon_Thani"
Step one. The problem is that in Excel, names of equipment are merged
in the Questions: the same question would then be seen as different
variables / columns. The first step is to make the questions uniform
between the different SCAT Excel files.
For example the question: “How many LightCycler 480 PCR are used for
molecular testing under normal operations?” need to be converted
removing the part ‘LightCycler 480 PCR’.
scat_df <- scat_df %>%
group_by(value) %>% ## cut the dataset in chunks by Excel file-name (=value)
mutate(ticker = row_number()) # add column with row number, which repeats for each 'file name'
scat_df <- ungroup(scat_df) ## undo grouping
## Rename First important equipment normal situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 13), "equipment_1_normal", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 14), "occupation_1_normal", scat_df$Question))
## Rename Second important equipment normal situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 15), "equipment_2_normal", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 16), "occupation_2_normal", scat_df$Question))
## Rename Third important equipment normal situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+are used for molecular testing under normal operations\\?', Question) & (ticker == 17), "equipment_3_normal", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the occupation of.+under normal operations\\?', Question) & (ticker == 18), "occupation_3_normal", scat_df$Question))
## Rename First important equipment emergency situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 25), "equipment_1_emergency", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 26), "occupation_1_emergency", scat_df$Question))
## Rename Second important equipment emergency situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 27), "equipment_2_emergency", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 28), "occupation_2_emergency", scat_df$Question))
## Rename Third important equipment emergency situation
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('How many.+can be used additionally during emergency operations\\?', Question) & (ticker == 29), "equipment_3_emergency", scat_df$Question))
scat_df <- scat_df %>%
mutate(Question =
ifelse(grepl('What is the maximum occupation of.+\\?', Question) & (ticker == 30), "occupation_3_emergency", scat_df$Question))
scat_df <- scat_df %>%
select(-ticker)Step two. The table is pivoted with one lab per row. The SCAT ‘Questions’ now become the Variable names. Most variable names (= Questions) become pretty long. Therefore, they need to be shortened. The new applied variable names are listed and described in the table in the Introduction.
# Pivoting the table
scat_df_wide <- scat_df %>% pivot_wider(
names_from = Question,
values_from = Answer,
values_fill = NA
)
#Change column names
scat_df_wide <- scat_df_wide %>%
rename(file_name = value) %>%
rename(date = `Date:`) %>%
rename(user_guidelines = `User guidelines`) %>%
rename(equipment_1 = `1.`) %>%
rename(equipment_2 = `2.`) %>%
rename(equipment_3 = `3.`) %>%
rename(normal_situation = `Question regarding a normal situation`) %>%
rename(test_max_day = `What is the maximum number of molecular tests the laboratory can perform per day?`) %>%
rename(operating_hrs = `What are the operating hours of the laboratory?`) %>%
rename(work_hrs = `How many hours do staff work per day?`) %>%
rename(man_hrs_pre = `How many manhours per day does the pre-analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>%
rename(man_hrs_an = `How many manhours per day does the analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>%
rename(man_hrs_post = `How many manhours per day does the post-analytic phase take in a normal situation performing the maximum amount of molecular tests?`) %>%
rename(emergency_situation = `Question regarding an emergency situation`) %>%
rename(operating_hrs_max = `What are the maximum operating hours of the laboratory during emergency operations?`) %>%
rename(work_hrs_max = `How many hours per day can staff work during emergency operations?`) %>%
rename(man_hrs_pre_add = `How many additional manhours would be available to work on the pre-analytical phase under emergency operations?`) %>%
rename(man_hrs_an_add = `How many additional manhours would be available to work on the analytical phase under emergency operations?`) %>%
rename(man_hrs_post_add = `How many additional manhours would be available to work on the post-analytical phase under emergency operations?`) %>%
rename(current_nr_tests = `Current maximum 24 hour surge capacity`) %>%
rename(max_nr_tests_emergency = `The maximum number of molecular tests the laboratory can perform per day under emergency operations`) %>%
rename(limiting_factor = `Factor limiting 24 hour surge capacity`)The data frame is exported back into an standard csv file into the
folder output.
write_csv(scat_df_wide, "./output/scat_complete.csv")The data-set can also be presented as an html interactive table. The buttons can be pressed to download the data in different formats. A subset is prepared for this table, otherwise it becomes too long.
scat_sub <- scat_df_wide %>%
select(file_name, city, test_max_day, max_nr_tests_emergency)
scat_sub$city <- as.factor(scat_sub$city)
scat_sub$test_max_day <- as.integer(scat_sub$test_max_day) # change variable type character to type integer
scat_sub$max_nr_tests_emergency <- as.integer(scat_sub$max_nr_tests_emergency)
scat_sub <- round_df(scat_sub, 0)
tbl_1 <- create_dt(scat_sub)
tbl_1saveWidget(tbl_1, "./output/tbl_1.html") ## Export interactive table 1 to folder output
tests <- scat_sub %>%
pivot_longer(
cols = test_max_day:max_nr_tests_emergency,
names_to = "stage",
values_to = "tests_nr"
)
p1 <- plot_ly(data = tests, x = ~tests_nr , y = ~city, orientation = 'h', color = ~stage) %>%
add_trace(type = "bar") %>%
layout(
title = "Number of tests per day in normal stage and during emergency.",
xaxis = list(title = "Number of tests"),
yaxis = list(title = "City name")
)
p1saveWidget(p1, "./output/plot1.html") ## Export interactive plot 1 to folder outputThe table below lists an overview of the three most important equipment used for performing the molecular tests in each laboratory. The last column contains the factor limiting the 24 hour surge capacity.
equipment <- scat_df_wide %>%
select(city, equipment_1, equipment_2, equipment_3, limiting_factor)
tbl2 <- head(equipment) %>%
kable() %>%
kable_styling(position = "left", full_width = FALSE) %>%
column_spec(1, bold = TRUE, border_right = TRUE, color = "black", background = "lightgrey")
tbl2| city | equipment_1 | equipment_2 | equipment_3 | limiting_factor |
|---|---|---|---|---|
| Bangkok | LightCycler 480 PCR | BSL-2 safety cabinet | MagNA pure RNA extraction | Additional personnel Pre-Analytical phase or longer working days |
| Chumpon | Biorad CFX Opus 96 | Qiagen Easy2 NA extraction | BSL2 Biosafety cabinet | Biorad CFX Opus 96 occupation or Additional Biorad CFX Opus 96 |
| Udon_Thani | Magnapure NA extraction | Biorad CFX Opus 96 | BSL2 Biosafety cabinet | Additional Magnapure NA extraction |
save_kable(tbl2,
"./output/tbl2.html",
bs_theme = "simplex",
self_contained = TRUE,
extra_dependencies = NULL,
latex_header_includes = NULL,
keep_tex = FALSE,
density = 300
)Possible error codes.
Line 97 Error in 'mutate()':
! Problem while computing '25h Surge Calculator = map(value,
You may have opened one or more an Excel files.
Solution: close all Excel files before running code.